For many of us, Bayesian Statistics is dark voodoo magic at best or completely subjective nonsense at worst. Here we will try to show how simple and elegant Bayesian Statistics can be, and motivate why you should use it.
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/Bayesian_Inference/BayesianStatistics.png')
Bayesian Statistics can be very useful when you would like to skip thinking about:
To answer the question of when to use Bayesian Statistics, you need to check how much data you have.
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/Bayesian_Inference/AmountOfData.png')
If we have N samples and P features in our data, we should definitely use Bayesian Statistics when P >> N, i.e. when the number of features is much greater than the number of samples. This is a typical setup for many biological projects, and Frequentist analysis cannot handle this situation ("underpowered study") without severe overfitting. As the amount of data grows, it makes more and more sense to use Frequentist and Deep Learning approaches.
Should you use Bayesian Statistics if you have Big Data? You might want to, because Big Data can be biased. Semi-successful tests of self-driving cars gave rise to Bayesian Deep Learning.
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/Bayesian_Inference/AleatoricUncertainty.jpg')
The main idea: to let the machine know how much it does not know through Uncertainty. The more a machine/person realizes how much it/he/she does not know, the more intelligent the machine/person is. Alex Kendall, one of the Deep Learning gurus, excellently describes the need for Uncertainty in Deep Learning in his blog. He describes the difference between Epistemic and Aleatoric uncertainties and shows how to implement them in the Loss function of a Neural Network.
Briefly, Epistemic Uncertainty is due to limitations of the training data set, while Aleatoric Uncertainty is related to the inability of a Neural Network to correctly quantify the input, as in the picture above, where the self-driving car could not detect a vehicle in front due to peculiar sunlight at dusk. Another great post describes how to practically implement Bayesian Deep Learning in Python.
Sometimes you really feel that there is something wrong with Frequentist Statistics, i.e. you really start to feel uncomfortable. Here are a few examples:
%%latex
\begin{align}
\text{Can we claim that:}\qquad
0.05 \approx \frac{12}{250} < \frac{2}{15} \approx 0.13
\end{align}
Here you need Bayesian Inference to properly compare two overlapping credible intervals.
David Robinson in his blog shows how incorporation of empirical Prior information helps to compare batting averages between two baseball players.
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/Bayesian_Inference/EmpiricalBayes.png')
Here you need Bayesian Inference, which is based on conditional probabilities and hence addresses causality.
Note that Bayesian Statistics is based on conditional probabilities, and P(A|B) is not necessarily equal to P(B|A); this naturally expresses directional relations between A and B, as in the Bayesian Network below, which correctly figures out how Diabetes-related factors influence each other:
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/Bayesian_Inference/bn_diab.png')
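To see concretely how different P(A|B) and P(B|A) can be, here is a minimal sketch of Bayes' rule with made-up numbers for a hypothetical diagnostic test (the prevalence, sensitivity and specificity below are illustrative, not from any real study):

```python
# Bayes' rule with hypothetical numbers: a disease with 1% prevalence
# and a test with 95% sensitivity and 90% specificity.
p_disease = 0.01
p_pos_given_disease = 0.95       # P(positive | disease), sensitivity
p_pos_given_healthy = 0.10       # P(positive | healthy), 1 - specificity

# Total probability of a positive test (law of total probability)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' rule: P(disease | positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(p_disease_given_pos, 3))  # ~0.088, far from P(positive | disease) = 0.95
```

Despite the seemingly accurate test, P(disease|positive) is under 9%, while P(positive|disease) is 95%: the two conditional probabilities are not interchangeable.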
Here you need a Bayesian Inference which automatically gives you a credibility interval
The only thing you can do within the framework of classical/Frequentist statistics to estimate the uncertainty is to bootstrap. However, the bootstrap is nothing more than sampling from the Posterior distribution, as Rasmus Bååth very elegantly explains.
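As a sketch of the bootstrap idea (not Bååth's exact code; the BMI values below are made up for illustration), resampling a small dataset with replacement gives a distribution of the mean that plays the role of a posterior:

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([24.1, 27.3, 22.8, 31.0, 26.5])  # hypothetical BMI values

# Bootstrap: resample with replacement many times, record the statistic of interest
boot_means = np.array([rng.choice(data, size=len(data), replace=True).mean()
                       for _ in range(10000)])

# A 95% bootstrap interval for the mean, analogous to a Bayesian credible interval
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {data.mean():.2f}, 95% interval = [{low:.2f}, {high:.2f}]")
```

The histogram of `boot_means` is what Bååth identifies with (an approximation of) the posterior of the mean under a non-informative prior.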
Here you need a Bayesian Inference to have less false positive discoveries
Andrew Gelman in his fantastic blog post very elegantly explains how Bayesian Statistics assigns a near-zero probability to making a claim with confidence on noisy data, while Frequentist Statistics still has 5% False Positives.
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/Bayesian_Inference/claims_with_confidence.png')
There are many reasons to be cautious when applying Frequentist Statistics. It is heavily based on the normality assumption and is sensitive to outliers. Operating with simple descriptive statistics, which do not always reflect the underlying distribution, Frequentist Statistics fails to capture the differences within Anscombe's quartet below, where all four data sets have identical linear regression slopes but very different distributions:
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/Bayesian_Inference/Anscombes_quartet.png')
In contrast, Bayesian Statistics operates with Posterior distributions (i.e. a Posterior distribution is the primary output of Bayesian Inference), which are more faithful representations of the data. There are a few more famous examples, called the "DataSaurus", which demonstrate that Frequentist Statistics cannot capture the difference between groups of samples with identical descriptive statistics such as the mean, standard deviation or Pearson's correlation coefficient.
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/Bayesian_Inference/DataSaurus.gif.png')
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/Bayesian_Inference/BoxViolinSmaller.gif.png')
For instance, association studies essentially compare means and standard deviations between groups of samples and therefore miss hidden differences.
So what is the main difference between Bayesian and Frequentist Statistics? Bayesian and Frequentist Statistics differ in their definition of probability.
In this section we will address a question: "What would be the Body Mass Index (BMI) of a typical Swedish citizen based on the data we have?" using solely Frequentist Statistics. For this purpose we will use the BMI data from the Malmö Offspring Study (MOS) project of PI Marju Orho-Melander from Lund University.
import pandas as pd
phen = pd.read_csv('/home/nikolay/WABI/M_Orho_Mellander/FromLouise/Age_sex_BMI_WABI2017.csv')
phen.fillna(phen.mean(), inplace = True)
phen.head()
If we consider N BMI data points, then in order to calculate the BMI of a typical citizen of Sweden we need to maximize the likelihood of observing all N BMI data points simultaneously; this is the so-called Maximum Likelihood Estimation (MLE) technique. We assume the likelihood to have a Gaussian shape with mean $\mu$ and standard deviation $\sigma$. Therefore $\mu$ is nothing other than the BMI of a typical Swedish citizen:
%%latex
\begin{align}
\rm{L}\,(\,\rm{BMI_i} \,|\, \mu,\sigma\,) =
\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\rm{BMI_i}-\mu)^2}{2\sigma^2}\right)
\end{align}
Maximizing the likelihood will automatically result (please check) in the well-known equations for the mean and standard deviation:
%%latex
\begin{align}
\mu = \frac{1}{N}\sum_{i=1}^N \rm{BMI_i}\\
\sigma^2 = \frac{1}{N}\sum_{i=1}^N (\rm{BMI_i}-\mu)^2
\end{align}
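The "please check" step is easy: take the logarithm of the likelihood and set its derivative with respect to $\mu$ to zero (the equation for $\sigma^2$ follows the same way):

```latex
\begin{align}
\log L &= -\frac{N}{2}\log(2\pi\sigma^2) - \sum_{i=1}^{N}\frac{(\rm{BMI_i}-\mu)^2}{2\sigma^2} \\
\frac{\partial \log L}{\partial \mu} &= \sum_{i=1}^{N}\frac{\rm{BMI_i}-\mu}{\sigma^2} = 0
\quad\Rightarrow\quad \mu = \frac{1}{N}\sum_{i=1}^{N}\rm{BMI_i}
\end{align}
```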
Let us randomly select 3 data points, i.e. N = 3 and visualize the likelihood:
import random
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

bmi_random = random.sample(list(phen['BMI']), 3)
print("We selected the following BMI data points: {}".format(bmi_random))
bmi_mean = np.array(bmi_random).mean()
if len(bmi_random) < 2:
    bmi_sigma = bmi_mean
else:
    bmi_sigma = np.array(bmi_random).std()
print("Mean BMI value: {}".format(bmi_mean))
print("BMI Standard Deviation: {}".format(bmi_sigma))
x = np.linspace(bmi_mean - 3*bmi_sigma, bmi_mean + 3*bmi_sigma, 100)
plt.figure(figsize=(20, 15))
# mlab.normpdf was removed from matplotlib; use scipy.stats.norm.pdf instead
plt.plot(x, norm.pdf(x, bmi_mean, bmi_sigma))
plt.plot(bmi_random, [0]*len(bmi_random), 'ro')
plt.ylabel('Frequency', fontsize=20)
plt.xlabel('BMI', fontsize=20)
plt.legend(['Likelihood', 'Data Points'], loc='upper right', fontsize=20)
plt.show()
To start using the Bayes rule we need a prior probability. We select a Gaussian prior with mean = 35 and standard deviation sd = 4. Suppose we have just moved to Sweden from the US and think that a typical BMI is 35. Another reason might be that we have heard a lot about obesity growing worldwide and exaggerate its prevalence. Let us display our prior:
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (20, 15)
import random
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

bmi_random = random.sample(list(phen['BMI']), 3)
print("We selected the following BMI data points: {}".format(bmi_random))
bmi_mean = np.array(bmi_random).mean()
if len(bmi_random) < 2:
    bmi_sigma = bmi_mean
else:
    bmi_sigma = np.array(bmi_random).std()
print("Mean BMI value: {}".format(bmi_mean))
print("BMI Standard Deviation: {}".format(bmi_sigma))
x = np.linspace(bmi_mean - 3*bmi_sigma, bmi_mean + 3*bmi_sigma, 100)
plt.plot(x, norm.pdf(x, bmi_mean, bmi_sigma))
plt.plot(bmi_random, [0]*len(bmi_random), 'ro')
prior_mean = 35
prior_sigma = 4
x = np.linspace(prior_mean - 3*prior_sigma, prior_mean + 3*prior_sigma, 100)
plt.plot(x, norm.pdf(x, prior_mean, prior_sigma))
plt.ylabel('Frequency', fontsize=20)
plt.xlabel('BMI', fontsize=20)
plt.legend(['Likelihood', 'Data Points', 'Prior'], loc='upper right', fontsize=20)
plt.show()
The Prior is one of the cornerstones of Bayesian Statistics. Its primary purpose is to incorporate expert knowledge into a scientific problem. But the concept of the Prior is much deeper than just that. The Prior adds regularization to the Maximum Likelihood Estimate (MLE) in the case of small sample size or biased data. Imagine we want to know what a typical BMI in Sweden would be for a person of a certain Age and Sex. For this purpose we will model BMI as depending linearly on Age and Sex:
%%latex
\begin{align}
\rm{BMI_i} \sim N(\mu_i,\sigma) \\
\mu_i = \alpha + \beta_1 \rm{Age_i} + \beta_2 \rm{Sex_i} \\
\rm{L}\,(\,\rm{BMI_i} \,|\, \mu_i,\sigma\,) =
\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\rm{BMI_i}-\mu_i)^2}{2\sigma^2}\right)
\end{align}
Now if you have just 3 data points, as in the example above, you will likely not be able to fit the MLE Linear Regression without severe overfitting, since you have 4 fitting parameters: $\alpha$, $\beta_1$, $\beta_2$ and $\sigma$. For this purpose, in Frequentist Statistics one can use Ridge regularization, where you put a constraint on the coefficients of the explanatory variables like this:
%%latex
\begin{align}
\rm{BMI_i} \sim N(\mu_i,\sigma) \\
\mu_i = \alpha + \beta_1 \rm{Age_i} + \beta_2 \rm{Sex_i} \\
\rm{L}\,(\,\rm{BMI_i} \,|\, \mu_i,\sigma\,) =
\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi\sigma^2}}
\exp\left(-\frac{(\rm{BMI_i}-\mu_i)^2}{2\sigma^2} - \lambda(\beta_1^2 + \beta_2^2)\right)
\end{align}
However, including regularization is nothing more than including a Prior in order to correct the MLE model. So why the hell didn't we use a Prior from the beginning? Apparently one needs to run into overfitting problems with Frequentist Statistics in order to realize that using Prior assumptions is very helpful.
Now it is time to calculate the Posterior distribution from the Likelihood and the Prior. Since we assumed both the Likelihood and the Prior to have a Gaussian shape, it is easy to calculate the Posterior analytically. This is the case of a so-called conjugate prior, i.e. the Posterior has the same functional form as the Prior. Here we will randomly draw 20 BMI values, add them one by one, and monitor how the Posterior changes as we add more and more data:
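Summarizing the data by its sample mean $\mu_L$ and standard deviation $\sigma_L$, the Gaussian-Gaussian conjugate update gives the Posterior parameters in closed form (these are exactly the formulas the code below implements):

```latex
\begin{align}
\mu_{post} = \frac{\mu_L\,\sigma_{prior}^2 + \mu_{prior}\,\sigma_L^2}{\sigma_{prior}^2 + \sigma_L^2}
\qquad
\sigma_{post}^2 = \frac{\sigma_{prior}^2\,\sigma_L^2}{\sigma_{prior}^2 + \sigma_L^2}
\end{align}
```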
import warnings
warnings.filterwarnings('ignore')
import math
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (20, 15)
import matplotlib.pyplot as plt
from scipy.stats import norm
import random

random.seed(12345678)
bmi_random_sample = random.sample(list(phen['BMI']), 100)
for i in range(1, 21):
    bmi_random = bmi_random_sample[0:i]
    bmi_mean = np.array(bmi_random).mean()
    if len(bmi_random) < 2:
        bmi_sigma = bmi_mean
    else:
        bmi_sigma = np.array(bmi_random).std()
    fig = plt.figure(1)
    axes = plt.gca()
    axes.set_xlim([10, 50])
    axes.set_ylim([-0.001, 0.16])
    # THIS IS THE LIKELIHOOD
    x = np.linspace(bmi_mean - 3*bmi_sigma, bmi_mean + 3*bmi_sigma, 100)
    plt.plot(x, norm.pdf(x, bmi_mean, bmi_sigma))
    plt.plot(bmi_random, [0]*len(bmi_random), 'bo')
    plt.title(str(i) + ' Data Points', fontsize=20)
    # THIS IS THE PRIOR
    prior_mean = 35
    prior_sigma = 4
    x = np.linspace(prior_mean - 3*prior_sigma, prior_mean + 3*prior_sigma, 100)
    plt.plot(x, norm.pdf(x, prior_mean, prior_sigma))
    # THESE ARE THE PARAMETERS OF THE POSTERIOR DISTRIBUTION AND THE POSTERIOR PLOT
    posterior_sigma = math.sqrt((prior_sigma**2 * bmi_sigma**2) / (prior_sigma**2 + bmi_sigma**2))
    posterior_mean = (bmi_mean*prior_sigma**2 + prior_mean*bmi_sigma**2) / (prior_sigma**2 + bmi_sigma**2)
    x = np.linspace(posterior_mean - 3*posterior_sigma, posterior_mean + 3*posterior_sigma, 100)
    plt.plot(x, norm.pdf(x, posterior_mean, posterior_sigma))
    plt.ylabel('Frequency', fontsize=20)
    plt.xlabel('BMI', fontsize=20)
    plt.legend(['Likelihood', 'Data Points', 'Prior', 'Posterior'], loc='upper right', fontsize=20)
    fig.savefig('/home/nikolay/WABI/Misc/figs/DataPoints_' + str(i) + '.png')
    plt.gcf().clear()
So we have generated 20 PNG plots for 20 Posteriors. Now we will merge them into a GIF file and rename it to png so that it can be inserted into Jupyter.
%%bash
FILE=/home/nikolay/WABI/Misc/figs/movie.gif.png
if [ -f $FILE ]; then
    rm $FILE
fi
convert -delay 200 -loop 0 $(ls -v /home/nikolay/WABI/Misc/figs/*.png) /home/nikolay/WABI/Misc/figs/movie.gif
mv /home/nikolay/WABI/Misc/figs/movie.gif /home/nikolay/WABI/Misc/figs/movie.gif.png
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/figs/movie.gif.png')
Here we observe that with only a few data points the Posterior looks very much like the Prior. Adding more and more data shifts the Posterior towards the Likelihood, i.e. the Prior becomes less and less dominant; it still retains its regularizing function, though.
Usually, of course, you do not have conjugate priors and you do not calculate the Posterior analytically. Often it is impossible to do analytically. A standard way to calculate a Posterior in Bayesian Inference is to use Stan, a special language for Probabilistic Programming. Stan has its own syntax, where you define your model rather than do traditional programming. A Stan program usually has 3 blocks: data, parameters and model; of these, only model is compulsory.
with open('/home/nikolay/WABI/Misc/bmi.stan') as file:
print(file.read())
After you have defined your model and saved it as a *.stan file, you can call the Stan model from R ("rstan") or Python ("pystan"). Remember to feed the data into the model before calling it.
import rpy2
%load_ext rpy2.ipython
%%R
sink("temp.txt")
library("rstan")
library("animation")
my_bmi <- na.omit(read.csv("/home/nikolay/WABI/M_Orho_Mellander/FromLouise/Age_sex_BMI_WABI2017.csv"))$BMI
random_sample <- my_bmi[sample(length(my_bmi), 25)]
saveGIF({
    for (N_data_points in 5:25)
    {
        my_bmi <- random_sample[1:N_data_points]
        stan.data <- list(N = length(my_bmi),
                          bmi = my_bmi)
        stan_output <- stan(file = "/home/nikolay/WABI/Misc/bmi.stan",
                            data = stan.data,
                            iter = 1000,
                            warmup = 500,
                            chains = 2, verbose = FALSE)
        # PRIOR
        mean <- 35; sd <- 4
        x <- seq(-4, 4, length = 100)*sd + mean
        hx <- dnorm(x, mean, sd)
        plot(x, hx, type = "l", col = "red", ylim = c(0, 0.3), xlim = c(15, 50), ylab = "Frequency", xlab = "BMI")
        mtext(paste0(N_data_points, " data points"))
        # LIKELIHOOD
        lines(density(my_bmi), col = "blue")
        # get points at axes
        points(my_bmi, rep(0, length(my_bmi)), col = "blue", pch = 16)
        # POSTERIOR
        p <- extract(stan_output, "bmi_exp")$bmi_exp
        lines(density(p))
        legend("topright", legend = c("Prior", "Likelihood", "Posterior"), col = c("red", "blue", "black"), lty = c(1, 1, 1), inset = 0.02)
    }
})
print(stan_output)
summary(stan_output)
file.rename("/home/nikolay/WABI/Misc/Bayesian_Inference/animation.gif",
            "/home/nikolay/WABI/Misc/Bayesian_Inference/animation.gif.png")
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/Bayesian_Inference/animation.gif.png')
Thus we get similar plots for the Prior, Likelihood and Posterior. The only difference now is that the Posterior was calculated by Stan rather than analytically. The family of algorithms Stan uses to calculate the Posterior is called Markov Chain Monte Carlo (MCMC).
A Markov Chain is a memoryless random process, i.e. one where the next step depends only on the previous step. Monte Carlo is a way to approximate some quantity by random sampling.
Markov Chain Monte Carlo (MCMC) is a method to approximate the posterior distribution of a parameter of interest by walking through the parameter space (the Markov Chain) towards the region of highest posterior density, and randomly sampling (Monte Carlo) in that region.
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/Bayesian_Inference/mcmc.png')
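As a minimal sketch of the idea (not what Stan actually runs; Stan uses the more sophisticated NUTS/HMC sampler, and the data and prior below are made up), a random-walk Metropolis sampler for the mean of a Gaussian looks like this:

```python
import numpy as np

rng = np.random.default_rng(1)
data = np.array([24.1, 27.3, 22.8, 31.0, 26.5])   # hypothetical BMI values
prior_mean, prior_sd = 35.0, 4.0                  # Gaussian prior, as in the text

def log_posterior(mu):
    # log prior + log likelihood, up to a constant (sigma fixed at 4 for simplicity)
    log_prior = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    log_lik = -0.5 * np.sum(((data - mu) / 4.0) ** 2)
    return log_prior + log_lik

mu, samples = 30.0, []
for _ in range(5000):
    proposal = mu + rng.normal(0, 1.0)            # random-walk proposal
    # accept with probability min(1, posterior ratio); reject otherwise
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    samples.append(mu)

posterior = np.array(samples[1000:])              # drop burn-in
print(f"posterior mean ~ {posterior.mean():.1f}")
```

The chain spends most of its time where the posterior density is high, so the retained samples approximate the posterior distribution of the mean.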
What is Monte Carlo simulation? It can be illustrated by approximating the area of a complex geometry: randomly throw points and count the fraction that falls within the geometry.
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/Bayesian_Inference/MC.png')
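The classic toy version of this idea is estimating π by throwing random points into the unit square and counting how many land inside the quarter circle:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# throw points uniformly into the unit square
x, y = rng.uniform(size=n), rng.uniform(size=n)

# the fraction inside the quarter circle of radius 1 approximates pi/4
inside = (x**2 + y**2) <= 1.0
pi_estimate = 4 * inside.mean()
print(pi_estimate)  # close to 3.14159 for large n
```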
What is a Markov Chain? It is a way to model a sequence of events where the next event depends only on the previous one; for example, the next position of a segment of a polymer chain is determined by the position of the previous segment.
from IPython.display import Image
Image('/home/nikolay/WABI/Misc/Bayesian_Inference/FloryLattice.png')
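A minimal illustration with a two-state chain (the transition probabilities below are made up): the next state depends only on the current one, and the long-run visit frequencies converge to the chain's stationary distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# transition matrix for a two-state chain (states 0 and 1); rows sum to 1
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

state, visits = 0, np.zeros(2)
for _ in range(100_000):
    # the Markov property: the next state depends only on the current state
    state = rng.choice(2, p=P[state])
    visits[state] += 1

print(visits / visits.sum())  # long-run frequencies approach the stationary distribution
```

For this matrix the stationary distribution is (5/6, 1/6), which the simulated frequencies approach.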
Let us demonstrate how MCMC works in Python by using PyMC3. Here we will make the typical-BMI example a bit more complex. In the previous section we found what the typical BMI of a Swedish citizen would be: around 27, based on just a few observations. However, we have heard that older people have more problems with their weight compared to younger people. I have just checked my weight and found it to be 83.7 kg, while I am 1.83 m tall, therefore my BMI is exactly 25, which seems to be below the mean BMI in Sweden, so I can be happy. But wait, maybe this is a lot for a 37-year-old person like me? So probably I should have asked not just "what is a typical BMI in Sweden?" but "what would be a typical BMI in Sweden for people of my age, i.e. 37 years old?". So let us plot the BMI vs. Age curve from the data from the MOS project of Marju Orho-Melander from Lund University, first fit a Maximum Likelihood Linear Regression, and then consider Bayesian Linear Regression. Again we will not use all the data but just select 26 random BMI and Age points:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
phen = pd.read_csv('/home/nikolay/WABI/M_Orho_Mellander/FromLouise/Age_sex_BMI_WABI2017.csv')
phen.fillna(phen.mean(), inplace = True)
phen = phen.sample(frac = 0.01)
N = phen.shape[0]
print("""We selected {0} random points for analysis""".format(N))
Let us plot BMI vs Age and fit a classical MLE Linear Regression:
import matplotlib.pyplot as plt
plt.figure(figsize=(20, 15))
plt.style.use('ggplot')
plt.scatter(phen['age_scr'], phen['BMI'], color='b')
plt.xlabel('AGE', fontsize=20)
plt.ylabel('BMI', fontsize=20)
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
# a pandas Series has no reshape method; convert to a numpy array first
lr.fit(phen['age_scr'].values.reshape(-1, 1), phen['BMI'])
plt.plot(phen['age_scr'], lr.intercept_ + lr.coef_[0]*phen['age_scr'], color='r')
plt.title('BMI = ' + str(lr.intercept_) + ' + ' + str(lr.coef_[0]) + '*AGE', fontsize=20)
plt.show()
We can get the slope and the intercept from the Maximum Likelihood Estimate (MLE):
%%latex
\begin{align}
L(\alpha,\beta,\sigma) =
\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\rm{BMI_i}-\alpha-\beta\,\rm{Age_i})^2}{2\sigma^2}\right)
\end{align}
import numpy as np
from scipy import optimize

def neg_normal_curve(parameters):
    alpha, beta, sigma = parameters
    y = phen['BMI']
    x = phen['age_scr']
    # negative likelihood; for large N this product underflows, so in practice
    # one minimizes the negative log-likelihood instead
    return -np.prod((1/np.sqrt(2*np.pi*(sigma**2)))*np.exp(-(y - alpha - beta*x)**2/(2*(sigma**2))))

parameters_guess = [19, 0.15, 5]
parameters_est = optimize.fmin(neg_normal_curve, parameters_guess)
print("""
Maximum Likelihood Estimate:
alpha = {parameters_est[0]:.5f}, beta = {parameters_est[1]:.5f}, sigma = {parameters_est[2]:.5f}
""".format(parameters_est = parameters_est))
Now how would you do it with Bayesian Inference? For this purpose we will use PyMC3 Python module:
import pymc3 as pm

basic_model = pm.Model()
with basic_model:
    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10)
    sigma = pm.HalfNormal('sigma', sd=1)
    # Expected value of outcome
    mu = alpha + beta*phen['age_scr']
    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=phen['BMI'])
    start = pm.find_MAP(model=basic_model)
    step = pm.Metropolis()
    # draw 20000 posterior samples
    trace = pm.sample(20000, step=step, start=start)
_ = pm.traceplot(trace)
plt.show()
# discard the first 1000 samples as burn-in
alpha_b = np.mean(trace['alpha'][1000:])
beta_b = np.mean(trace['beta'][1000:])
sigma_b = np.mean(trace['sigma'][1000:])
print("""
Bayesian Estimate:
alpha = {parameters_est[0]:.5f}, beta = {parameters_est[1]:.5f}, sigma = {parameters_est[2]:.5f}
""".format(parameters_est = [alpha_b, beta_b, sigma_b]))
plt.figure(figsize=(20,15))
import seaborn as sns
sns.kdeplot(trace['alpha'],trace['beta'])
plt.xlabel('ALPHA', fontsize = 20)
plt.ylabel('BETA', fontsize = 20)
plt.style.use('ggplot')
plt.scatter(trace['alpha'],trace['beta'], color = 'b', s = 1)
plt.show()
plt.figure(figsize=(20,10))
plt.subplot(121)
plt.hist(trace['alpha'], histtype = 'step', lw = 4, bins=100, color = 'r')
plt.xlabel('ALPHA', fontsize = 20)
plt.ylabel('NUMBER OF SAMPLES', fontsize = 20)
plt.title('POSTERIOR OF ALPHA', fontsize = 20)
plt.subplot(122)
plt.hist(trace['beta'], histtype = 'step', lw = 4, bins=100, color = 'b')
plt.xlabel('BETA', fontsize = 20)
plt.ylabel('NUMBER OF SAMPLES', fontsize = 20)
plt.title('POSTERIOR OF BETA', fontsize = 20)
plt.show()
To check convergence of the MCMC we plot the autocorrelation function:
pm.autocorrplot(trace)
plt.show()
plt.figure(figsize=(20, 15))
# drop burn-in and thin the chain so we do not plot thousands of lines
for alpha_posterior, beta_posterior in zip(trace['alpha'][1000::50], trace['beta'][1000::50]):
    plt.plot(phen['age_scr'], alpha_posterior + beta_posterior*phen['age_scr'], alpha=.3)
plt.xlabel('AGE', fontsize=20)
plt.ylabel('BMI', fontsize=20)
plt.title('Lines Sampled from Posterior')
plt.scatter(phen['age_scr'], phen['BMI'], color='b')
plt.plot(phen['age_scr'], alpha_b + beta_b*phen['age_scr'], color='r', lw=4)
plt.show()
def bmi_at_age(age):
    # posterior of the regression line at a given age, using post-burn-in samples
    trace_bmi = []
    for alpha_posterior, beta_posterior in zip(trace['alpha'][1000:], trace['beta'][1000:]):
        trace_bmi.append(alpha_posterior + beta_posterior*age)
    return np.mean(trace_bmi), np.std(trace_bmi)

age = 37
bmi_b_mean, bmi_b_std = bmi_at_age(age)
print("BMI at Age = {} Years Old Should Be: {} +/- {}".format(age, bmi_b_mean, 2*bmi_b_std))